Learning Step Size Controllers for Robust Neural Network Training

Authors

  • Christian Daniel
  • Jonathan Taylor
  • Sebastian Nowozin
Abstract

This paper investigates algorithms to automatically adapt the learning rate of neural networks (NNs). Starting with stochastic gradient descent, a large variety of learning methods has been proposed for the NN setting. However, these methods are usually sensitive to the initial learning rate, which has to be chosen by the experimenter. We investigate several features and show how an adaptive controller can adjust the learning rate without prior knowledge of the learning problem at hand.

Introduction

Due to the recent successes of Neural Networks for tasks such as image classification (Krizhevsky, Sutskever, and Hinton 2012) and speech recognition (Hinton et al. 2012), the underlying gradient descent methods used for training have gained renewed interest from the research community. Adding to the well-known stochastic gradient descent and RMSprop methods (Tieleman and Hinton 2012), several new gradient-based methods such as Adagrad (Duchi, Hazan, and Singer 2011) or Adadelta (Zeiler 2012) have been proposed. However, most of the proposed methods rely heavily on a good choice of an initial learning rate. Compounding this issue is the fact that the range of good learning rates for one problem is often small compared to the range of good learning rates across different problems, i.e., even an experienced experimenter often has to manually search for good problem-specific learning rates.

A tempting alternative to manually searching for a good learning rate would be to learn a control policy that automatically adjusts the learning rate without further intervention, using, for example, reinforcement learning techniques (Sutton and Barto 1998). Unfortunately, the success of learning such a controller from data is likely to depend heavily on the features made available to the learning algorithm. A wide array of reinforcement learning literature has shown the importance of good features in tasks ranging from Tetris (Thiery and Scherrer 2009) to haptic object identification (Kroemer, Lampert, and Peters 2011). Thus, the first step towards applying RL methods to control learning rates is to find good features. Subsequently, the main contributions of this paper are:

  • Identifying informative features for the automatic control of the learning rate.
  • Proposing a learning setup for a controller that automatically adapts the step size of NN training algorithms.
  • Showing that the resulting controller generalizes across different tasks and architectures.

Together, these contributions enable robust and efficient training of NNs without the need for manual step-size tuning.

Method

The goal of this paper is to develop an adaptive controller for the learning rate used in training algorithms such as Stochastic Gradient Descent (SGD) or RMSprop (Tieleman and Hinton 2012). We start with a general statement of the problem we are aiming to solve.

Problem Statement

We are interested in finding the minimizer

    ω* = arg min_ω F(X; ω),   (1)

where in our case ω represents the weight vector of the NN and X = {x_1, . . . , x_N} is the set of N training examples (e.g., images and labels). The function F(·) sums over the function values induced by the individual inputs such that ...
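The preview cuts off mid-sentence at this point. Assuming the standard per-example decomposition that the sentence appears to introduce (the symbol f below is a placeholder, not taken from the excerpt), the objective would read

    F(X; ω) = Σ_{n=1}^{N} f(x_n; ω),

with f(x_n; ω) the loss incurred on the n-th training example.

To make the proposed setup concrete, the following sketch shows a plain SGD loop with a pluggable step-size controller that maps simple training statistics to a multiplicative learning-rate change. It is a minimal illustration only: the feature choices, the function names (sgd_with_controller, simple_controller), and the hand-coded rule are assumptions for the sketch, not the controller learned in the paper.

    import numpy as np

    def sgd_with_controller(grad_fn, w, data, controller, eta0=0.01,
                            epochs=10, batch=32, seed=0):
        """Minimal SGD loop whose learning rate is adjusted by `controller`.

        grad_fn(w, X) must return (loss, gradient) for a mini-batch X.
        `controller` maps a small feature vector to a multiplicative
        learning-rate update. Everything here is an illustrative sketch.
        """
        rng = np.random.default_rng(seed)
        eta = eta0
        prev_loss = None
        n = len(data)
        for _ in range(epochs):
            order = rng.permutation(n)
            for i in range(0, n, batch):
                X = data[order[i:i + batch]]
                loss, g = grad_fn(w, X)
                # Example features a controller could observe: change in the
                # mini-batch loss, gradient magnitude, and current step size.
                feats = np.array([
                    0.0 if prev_loss is None else loss - prev_loss,
                    np.log1p(np.linalg.norm(g)),
                    np.log(eta),
                ])
                eta *= controller(feats)   # controller rescales the step size
                w = w - eta * g            # plain SGD update
                prev_loss = loss
        return w

    def simple_controller(feats, up=1.05, down=0.7):
        # Hand-coded baseline: shrink the rate when the loss increased,
        # otherwise grow it slowly. A learned policy would replace this rule.
        return down if feats[0] > 0.0 else up

A learned controller would replace simple_controller with a policy trained, for example, by reinforcement learning, while the surrounding training loop stays unchanged.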

Similar resources

A Differential Evolution and Spatial Distribution based Local Search for Training Fuzzy Wavelet Neural Network

Many parameter-tuning algorithms have been proposed for training Fuzzy Wavelet Neural Networks (FWNNs). The absence of an appropriate structure, convergence to local optima, and the low speed of the learning algorithms are deficiencies of FWNNs in previous studies. In this paper, a Memetic Algorithm (MA) is introduced to train FWNNs and address the aforementioned learning deficiencies. Differential Evolution...

Reliability-Based Robust Multi-Objective Optimization of Friction Stir Welding Lap Joint AA1100 Plates

The current paper presents a robust optimum design of friction stir welding (FSW) lap joints of AA1100 aluminum alloy sheets using Monte Carlo simulation, NSGA-II, and a neural network. First, a perceptron neural network model was obtained to find the relation between the inputs and outputs. To this end, the results of thirty friction stir welding tests are used for training and testing the neural network...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office-automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...

Application of ANN Technique for Interconnected Power System Load Frequency Control (RESEARCH NOTE)

This paper describes an application of Artificial Neural Networks (ANN) to Load Frequency Control (LFC) of nonlinear power systems. Power systems, like other industrial processes, have parametric uncertainties that must be taken into account in controller design. For this reason, ideas from robust control theory are used in the design of the LFC controller. To improve ...

Wavelet Neural Network with Random Wavelet Function Parameters

The training algorithm of Wavelet Neural Networks (WNNs) is a bottleneck that impacts the accuracy of the final WNN model. Several methods have been proposed for training WNNs. From the perspective of our research, most of these algorithms are iterative and need to adjust all the parameters of the WNN. This paper proposes a one-step learning method which changes the weights between hidden la...

Saturated Neural Adaptive Robust Output Feedback Control of Robot Manipulators: An Experimental Comparative Study

In this study, an observer-based tracking controller is proposed and evaluated experimentally to solve the trajectory tracking problem of robotic manipulators with torque saturation in the presence of model uncertainties and external disturbances. In comparison with the state-of-the-art observer-based controllers in the literature, this paper introduces a saturated observer-based controller bas...

Publication date: 2016